|
The Gittins index is a measure of the reward that can be achieved by a random process bearing a termination state and evolving from its present state onward, under the option of terminating the said process at every later stage with the accrual of the probabilistic expected reward from that stage up to the attainment of its termination state. It is a real scalar value associated to the state of a stochastic process with a reward function and a probability of termination. ==Terminology== To illustrate the theory we can take two examples from a developing sector, such as from electricity generating technologies: wind power and wave power. If we are presented with the two technologies when they are both proposed as ideas we cannot say which will be better in the long run as we have no data, as yet, to base our judgments on. It would be easy to say that wave power would be too problematic to develop as it seems easier to put up lots of wind turbines than to make the long floating generators, tow them out to sea and lay the cables necessary. If we were to make a judgment call at that early time in development we could be condemning one technology to being put on the shelf and the other would be developed and put into operation. If we develop both technologies we would be able to make a judgment call on each by comparing the progress of each technology at a set time interval such as every three months. The decisions we make about investment in the next stage would be based on those results.〔 In a paper in 1979 called ''Bandit Processes and Dynamic Allocation Indices'' John C. Gittins suggests a solution for problems such as this. He takes the two basic functions of a "Scheduling Problem" and a "Multi-armed bandit" problem〔J. C. Gittins, ''(Bandit Processes and Dynamic Allocation Indices )'', Journal of the Royal Statistical Society, Series B, Vol. 41, No. 2. (1979), pp. 148–177.〕 and shows how these problems can be solved using ''Dynamic allocation indices''. He first takes the "Scheduling Problem" and reduces it to a machine which has to perform jobs and has a set time period, every hour or day for example, to finish each job in. The machine is given a reward value, based on finishing or not within the time period, and a probability value of whether it will finish or not for each job is calculated. The problem is "to decide which job to process next at each stage so as to maximize the total expected reward."〔 He then moves on to the "Multi–armed bandit problem" where each pull on a "one armed bandit" lever is allocated a reward function for a successful pull, and a zero reward for an unsuccessful pull. The sequence of successes forms a Bernoulli process and has an unknown probability of success. There are multiple "bandits" and the distribution of successful pulls is calculated and different for each machine. Gittins states that the problem here is "to decide which arm to pull next at each stage so as to maximize the total expected reward from an infinite sequence of pulls."〔 Gittins says that "Both the problems described above involve a sequence of decisions, each of which is based on more information than its predecessors, and these both problems may be tackled by dynamic allocation indices."〔 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Gittins index」の詳細全文を読む スポンサード リンク
|